class: center, middle, inverse, title-slide .title[ # Be Bayesian my friend. An introduction to Bayesian inference ] .subtitle[ ## VI Jornadas Científicas de Estudiantes de la SEB ] .author[ ###
Joaquín Martínez-Minaya, 2022-09-14
VAlencia BAyesian Research Group
Statistical Modeling Ecology Group
Grupo de Ingeniería Estadística Multivariante
] .date[ ###
jmarmin@eio.upv.es
]

---
class: inverse, center, middle, animated, slideInRight

# What do the following events have in common?

---
# Palomares Bomb

.left-column3[
- During a routine .hlb[Cold War] operation on 17 January 1966 over Palomares (Almería, Spain), an incident happened.

- At 10:22 a.m. that day, .hlb[two aircraft collided over the skies of Almería]: a KC-135 from Moron Air Base (Seville) was about to refuel a B-52 from Turkey that was returning to North Carolina.

- Only .hlb[4 of the 13] crew members of both aircraft survived.

- The plane returning from Turkey carried 4 atomic bombs, and all four fell on Palomares, in the east of Almería.

- Of the .hlb[four bombs], three fell on land and could be located. Numerous witnesses assured the US Army detachment sent to Palomares that the .hlbred[stray bomb] had landed in the water. But, .hlbpurple[how to find it?]
]

.right-column3[
The casings of two B28 nuclear bombs involved in the Palomares incident, on display at the National Atomic Museum, in Albuquerque
]

---
# Alan Turing and the Enigma code

.left-column3[
- During the Second World War, Alan Turing and his team worked on decrypting the .hlb[Enigma code].

- The algorithm they developed was called .hlb["Banburismus"]. It was used to .hlb[decrypt messages sent by the German Navy].

- The Enigma machine consisted of a keyboard, a panel where the letters lit up, and .hlbred[several rotors].

- To encrypt a message, the rotors were placed in a .hlb[certain position], the message was typed, and the encrypted message appeared on the panel.

- To decrypt an encrypted message, the process was symmetrical: simply .hlb[set the rotors to the initial configuration] and type in the encrypted message, which would appear decoded on the panel. But, .hlb[how did Banburismus decrypt messages?]
]

.right-column3[
</br>
Enigma machine
]

---
# How were these problems solved?
.right-column11[
]

--

<font size = "+1">
.left-column11[
.hlbpurple[Palomares Bomb]

- .hlb[A prior probability was assigned to each area of the map] based on the subjective knowledge of the experts. This probability could be .hlb[updated] with incoming information from the search.

- That day, .hlb[Paco was fishing 90 metres from the place where the bomb fell], and was able to act as a guide for the Americans who had been sent to Palomares.

- With his help, .hlb[they were able to formulate another map based on the information he gave them]. They finally managed to locate the bomb.

.hlbpurple[Alan Turing and the Enigma code]

- As coded messages were received, the belief about the .hlb[hypothetical machine configuration was updated].

- When the .hlb[weight of evidence in favour of a particular configuration] of the Enigma machine was sufficiently high, that configuration was considered probable.
]
</font>

---
# Table of Contents

## 1. A bit of history

## 2. Then..., what is the Bayesian approach?

## 3. Let's predict

---
class: center, middle, animated, rotateInUpRight, inverse

# 1. A bit of history

---
# Thomas Bayes (1701-1761)

.left-column11[
- He was an .hlb[English statistician, philosopher and Presbyterian minister].

- He was the .hlb[eldest of 7 children] born to Ann and Joshua Bayes. Joshua was one of the first seven Presbyterian ministers; Presbyterianism was a branch of the Protestant church that did not support the Anglican church.

- He moved to London, but as a Presbyterian, .hlb[Bayes was unable to study theology] in that city, and had to travel to Edinburgh instead. It seems that he also studied .hlb[Mathematics and Logic].

- Thomas was interested in studying .hlb[the most probable causes] of what was happening in order to prove God's existence and benevolence.
]

.right-column11[
]

---
# The work that changed everything

.left-column11[
- In this respect, one of the people Thomas most admired was .hlb[Sir Isaac Newton].
Newton had described the rules governing nature, and was trying to prove that if .hlb[there were such rules, it was because there was a God who organized everything].

- Bayes devised a procedure by which, .hlb[starting from zero knowledge, one can learn from observations (data, facts) to get to know what causes them].

- During his lifetime, he did not want to .hlb[publish his theorem] because he thought it was irrelevant.

- It only saw the light of day when his friend .hlb[Richard Price] retrieved it from a pile of papers. Finally, it appeared in 1763, in the journal Philosophical Transactions under the name: .hlbred[An Essay towards solving a Problem in the Doctrine of Chances].
]

--

.right-column11[
]

---
# How does it work?

- I know something about the hypothesis -> .hlbred[Prior distribution]: `\(Pr(Hypothesis)\)`.

- I observe the data -> .hlbred[Likelihood]: `\(Pr(Data \mid Hypothesis)\)`.

- I update my knowledge about my hypothesis -> .hlbred[Posterior distribution]: `\(Pr(Hypothesis \mid Data)\)`.

--

</br>

$$ Pr(Hypothesis \mid Data) = \frac{Pr(Data \mid Hypothesis) \cdot Pr(Hypothesis)}{Pr(Data)} $$

---
# Example: Bayesianitis

.left-column12[
- A test for detecting .hlb[Bayesianitis] (a Statistics condition) has been developed. This test has:
  - 95% sensitivity
  - 98% specificity

- It is known that .hlbred[1 of 100 statisticians] has this condition. The test is administered massively in universities across Spain. What is the probability that Laplace, who has .hlb[tested + in this test, is indeed Bayesianitis +]?

- Laplace has tested positive in the first test. What would be the probability of being .hlb[Bayesianitis + if he tests positive again]?
]

.right-column12[
</br>
]

---
# Example: Bayesianitis. Posterior

.left-column8[
.hlbred[Hypothesis. Prior information]

`\(B\)` is the 'true condition' of Laplace:

- `\(B^+\)`: Laplace is Bayesianitis +
- `\(B^-\)`: Laplace is Bayesianitis -

"1/100 prevalence" -> `\(Pr(B^+) = 0.01\)`

.hlbred[Data. Observed and, therefore, known]

`\(+_1\)`: the first test is positive.
**95% sensitivity** -> `\(Pr(+_1 \mid B^+) = 0.95\)`

**98% specificity** -> `\(Pr(-_1 \mid B^-) = 0.98\)`
]

--

.right-column8[
.hlbred[Updated Hypothesis. Posterior information]

`\begin{eqnarray} & & Pr(B^+ \mid +_1 ) = \frac{Pr( +_1 \mid B^+)Pr(B^+)}{Pr(+_1)} \\ \\ & = & \frac{Pr( +_1 \mid B^+)Pr(B^+)}{Pr( +_1 \mid B^+)Pr(B^+) + Pr( +_1 \mid B^-)Pr(B^-) } \\ \\ & = & \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.02 \times 0.99} = 0.324 \\ \end{eqnarray}`

- .hlb[32.4% of those Statisticians getting a `\(+\)` test are Bayesianitis +.]
]

---
# Example: Bayesianitis. Testing again. Updating

- What would be the probability of being Bayesianitis + if Laplace tests positive again?

.left-column8[
.hlbred[Hypothesis. Prior information]

The prior is now the posterior from the previous step.

`\(P(B^+ \mid +_1) = 0.324\)`

.hlbred[Data. Observed and, therefore, known]

`\(+_2\)`: the second test is positive.

**95% sensitivity** -> `\(P(+_2 \mid B^+) = 0.95\)`

**98% specificity** -> `\(P(-_2 \mid B^-) = 0.98\)`
]

--

.right-column8[
.hlbred[Updated Hypothesis. Posterior information]

`\begin{eqnarray} & & P(B^+ \mid +_2, +_1 ) = \frac{P(+_2 \mid B^+, +_1)P(B^+ \mid +_1)}{P(+_2 \mid +_1)} \\ \\ & = & \frac{P(+_2 \mid B^+) P(B^+ \mid +_1)}{P(+_2 \mid B^+)P(B^+ \mid +_1) + P(+_2 \mid B^-)P(B^- \mid +_1) } \\ \\ & = & \frac{0.95 \times 0.324}{0.95 \times 0.324 + 0.02 \times 0.676} = 0.958 \end{eqnarray}`

- The simplification `\(P(+_2 \mid B^+, +_1) = P(+_2 \mid B^+)\)` holds because the two tests are conditionally independent given the true condition.

- .hlb[95.8% of those Statisticians getting a second `\(+\)` test are Bayesianitis +.]
]

<!-- --- -->
<!-- # Why is this happening? Simple statistical calculus -->

---
class: inverse, center, middle, animated, rotateInUpRight

# 2. Then..., what is the Bayesian approach?
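
---
# Bayesianitis in code

Before formalizing the approach, the two Bayesianitis updates from the previous section can be checked numerically. A minimal Python sketch (not part of the original example; plain arithmetic only):

```python
# Sequential Bayesian updating for the Bayesianitis example.
prior = 0.01   # Pr(B+): 1 in 100 statisticians has the condition
sens = 0.95    # sensitivity, Pr(+ | B+)
spec = 0.98    # specificity, Pr(- | B-), so Pr(+ | B-) = 1 - spec

def update(prior, sens, spec):
    """Bayes' theorem for a positive test result."""
    evidence = sens * prior + (1 - spec) * (1 - prior)  # Pr(+)
    return sens * prior / evidence

post1 = update(prior, sens, spec)  # after the first positive test
post2 = update(post1, sens, spec)  # yesterday's posterior is today's prior
print(round(post1, 3), round(post2, 3))  # 0.324 0.958
```

The key point: the second call reuses the first posterior as its prior, which is exactly the updating scheme of the previous slides.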
---
# Density and probability functions

.hlbpurple[Bayes' Theorem]

$$ Pr(Hypothesis \mid Data) = \frac{Pr(Data \mid Hypothesis) \cdot Pr(Hypothesis)}{Pr(Data)} $$

--

.hlbpurple[Bayesian Inference]

$$ p(\boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{p(\boldsymbol{y} \mid \boldsymbol{\theta}) \cdot p(\boldsymbol{\theta})}{p(\boldsymbol{y})} = \frac{p(\boldsymbol{y} \mid \boldsymbol{\theta}) \cdot p(\boldsymbol{\theta})}{\int p(\boldsymbol{\theta}) p(\boldsymbol{y} \mid \boldsymbol{\theta})\text{d} \boldsymbol{\theta} } \propto p(\boldsymbol{y} \mid \boldsymbol{\theta}) \cdot p(\boldsymbol{\theta}) $$

- .hlb[Likelihood] `\(p(\boldsymbol{y} \mid \boldsymbol{\theta})\)`: the function that extracts the information contained in the data.

- .hlb[Prior distribution] `\(p(\boldsymbol{\theta})\)`: represents our previous knowledge about the parameter of interest.

- .hlb[Posterior distribution] `\(p(\boldsymbol{\theta} \mid \boldsymbol{y})\)`: represents what we know after having seen the data. The basis for inference: a distribution, possibly multivariate if there is more than one parameter.

---
# Bayesian inference. Principles

- The interpretation of probability is related to our .hlb[degree of belief] about the event we are considering. For example, an 80% probability of rain expresses our degree of belief that it will rain, not a long-run frequency.

--

- The information and uncertainty we have about everything unknown are expressed in .hlb[terms of probability distributions].

--

- It considers as uncertain elements of the problem not only .hlb[the data] but also .hlb[the parameters].

--

- The idea of .hlbred[repeated sampling is not used] to interpret the properties of estimators.

--

- .hlb[Observed information updates knowledge about the unknown].

--

- .hlbred[Estimators disappear]: parameters are .hlb[described in terms of probability distributions].

--

- .hlbred[The frequentist approach relies on sample data (present)]. .hlb[The Bayesian approach also uses prior information (past)].

---
# Example: Scoring penalties Valencia C.
F.

.left-column9[
- Liga Santander is one of the most famous football leagues in the world. In this example, we use data from the last 10 seasons to estimate the probability of success `\((\pi)\)` of scoring a penalty for .hlb[Valencia Club de Fútbol].
]

.right-column9[
]

---
# Example: Scoring penalties Valencia C. F.

.left-column7[
### .hlb[Response variable + Data]

- `\(Y = {\text{score/miss the penalty}}\)`

- `\(Y\)` is modeled as a .hlb[Bernoulli] variable with parameter `\(\pi\)`, i.e., `\(Y \sim Ber(\pi)\)`

- .hlb[Likelihood]
`$$p(\boldsymbol{y} \mid \pi) = \ell(\pi) = \pi^{k}(1-\pi)^{N-k}$$`

  k: number of penalties scored. N: total number of penalties in the 10 seasons.
]

--

.right-column7[
### .hlb[Prior knowledge about the parameter] `\(\pi\)`

- A .hlb[Beta distribution] seems adequate to model a proportion `\(\pi\)`.

- After asking some experts, we end up with roughly a 75% chance of scoring a penalty.

- We express this uncertainty using the percentiles `\(per_{90} = 0.8\)` and `\(per_{50} = 0.75\)`.

- The corresponding values of a and b are `\(a = 83.46\)` and `\(b = 28.05\)`.

- .hlb[Prior distribution]
$$p(\pi) \propto \pi^{a-1}(1-\pi)^{b-1} $$
]

---
# Graphical Model

.left-column9[
.center[
.hlbpurple[Likelihood]

$$p(\boldsymbol{y} \mid \pi) = \pi^{k}(1-\pi)^{N-k} $$

.hlbpurple[Prior distribution]

$$p(\pi) \propto \pi^{a-1}(1-\pi)^{b-1}\, $$

$$\pi \sim \text{Beta}(a, b) $$
]

</br>

.hlbred[Ellipses: variables]

.hlbred[Squares: data]
]

--

.right-column9[
.center[
]
]

---
# Example. Likelihood *vs* Prior

.center[
]

---
# Posterior distribution.
Bayesian learning process

### Estimating the probability of scoring a penalty

.left-column8[
.hlbpurple[Likelihood]

$$p(\boldsymbol{y} \mid \pi) = \pi^{k}(1-\pi)^{N-k} $$

.hlbpurple[Prior distribution]

$$p(\pi) \propto \pi^{a-1}(1-\pi)^{b-1} $$

.hlbpurple[Posterior distribution]

`\begin{eqnarray} p(\pi \mid \boldsymbol{y}) & \propto & p(\boldsymbol{y} \mid \pi) \cdot p(\pi) \\ & \propto & \pi^{k + a - 1}(1 - \pi)^{N - k + b - 1} \end{eqnarray}`

$$\pi \mid \boldsymbol{y} \sim \text{Beta}(k + a, N - k + b) $$
]

--

.right-column8[
.center[
]

Let's try to understand how a prior works: https://minaya.shinyapps.io/Beta-Conjugate-Priors/
]

---
# Data *vs* prior information

.center[
]

---
# Conjugate priors

.left-column9[
- The .hlb[Beta distribution is a conjugate prior] for the Binomial likelihood function.

- If the posterior distribution `\(p(\theta \mid y)\)` is in the same probability distribution family as the prior distribution `\(p(\theta)\)`, the prior and posterior are called conjugate distributions, and the prior is called .hlb[a conjugate prior for the likelihood function] `\(p(y \mid \theta)\)`.

- <a href="https://www.johndcook.com/CompendiumOfConjugatePriors.pdf " style="color:blue;"> Compendium Of Conjugate Priors </a>
]

.right-column9[
]

---
# Describing results: point estimators, credible intervals

- We obtain a .hlb[probability or a density function as a posterior].
So, we can work with the complete distribution.

.left-column9[
.hlbpurple[Point estimates]

- Mean, median, mode

.hlbpurple[Credible intervals]

- A `\(100(1-\alpha)\%\)` credible interval (CI) for a parameter `\(\theta\)` is defined as the pair of values a and b such that: `\(p(\theta \leq a \mid \boldsymbol{y}) = \alpha/2\)` and `\(p(\theta \geq b \mid \boldsymbol{y}) = \alpha/2\)`
]

.right-column9[
- The `\(CI_{95\%}(\pi) = (0.65; 0.79)\)`
]

---
# Credible interval *vs* Confidence interval

.left-column9[
- .hlbred[Frequentist approach]: a `\(100(1-\alpha)\%\)` confidence interval is defined such that, if the data collection process were repeated again and again, then in the long run `\(100(1-\alpha)\%\)` .hlbred[of the confidence intervals formed would contain the (fixed) unknown parameter value].

- .hlb[Bayesian approach]: a `\(100(1-\alpha)\%\)` credible interval explicitly indicates .hlb[the posterior probability that] `\(\theta\)` .hlb[lies within its boundaries]. Hence the name credible interval.

- `\(CI_{95\%}(\pi) = (0.65; 0.79)\)` means that the probability for `\(\pi\)` to be between 0.65 and 0.79 is 0.95.
]

.right-column9[
- Bayesian inference can also provide direct probability statements about parameters.

- For example, we could compute `\(p(\pi > 0.7 \mid \boldsymbol{y}) = 0.73\)`.
]

---
class: inverse, center, middle, animated, rotateInUpRight

# 3. Let's predict

---
# Predictions

## Prior predictive distribution

- Uses just the .hlb[previous information] about the population.

--

- .hlbred[Before performing the experiment], one can assess which values are most/least probable to be observed.

`$$p(y_{pred}) = \int{p(y_{pred} \mid \theta) p(\theta) \text{d} \theta}$$`

--

## Posterior predictive distribution

- Uses the .hlb[updated] information after performing the experiment.

--

- Allows us to assess the most/least probable values to be observed if we were to .hlbred[repeat the experiment] in the future (under the same conditions).
`$$p(y_{pred} \mid y_{obs}) = \int{p(y_{pred} \mid \theta) p(\theta \mid y_{obs}) \text{d} \theta}$$`

---
# Prior *vs* Posterior predictive

- Gennaro Gattuso wants to know if he can trust his players to score the next penalty. The figure compares what happens when we use just the expert knowledge versus the expert knowledge combined with the data.

.center[
]

---
# What we have learned so far

- ### .hlb[ALL uncertainty] is quantified through probability distributions.

--

- ### Bayes' theorem (probability calculus) is the tool to .hlb[combine several sources of uncertainty].

--

- ### .hlb[Prior information is totally separated from information from the data] (using the data twice is forbidden!)

--

- ### Prior information .hlb[is updated by data information into posteriors].

--

- ### .hlb[Posterior distributions combine and quantify] the information/uncertainty about the state of interest and are the building blocks of Bayesian inference.

---
# But..., in the meantime...

### - Each team has many .hlb[players]

--

### - In the league there are different .hlb[teams]

--

### - In each country there is a different .hlb[league]

--

### - In addition to the league, there are other .hlb[competitions]: the Champions League, the Europa League

--

### - There is a hierarchy

--

## .hlb[How can we model that?]
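
---
# The penalty model in code

The conjugate update of the previous slides can be sketched in a few lines of Python. The prior values `a = 83.46`, `b = 28.05` come from the slides; the data `k` and `N` are hypothetical, since the slides do not report them, so the printed summaries are illustrative rather than the slides' exact numbers. Only the standard library is used.

```python
import random

random.seed(1)

a, b = 83.46, 28.05   # elicited prior: pi ~ Beta(a, b)
k, N = 140, 200       # HYPOTHETICAL data: k scored out of N penalties

# Conjugacy: the posterior is Beta(k + a, N - k + b)
post_a, post_b = k + a, N - k + b

# Monte Carlo summary of the posterior
draws = sorted(random.betavariate(post_a, post_b) for _ in range(20000))
post_mean = sum(draws) / len(draws)
ci = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
prob_gt_07 = sum(d > 0.7 for d in draws) / len(draws)

print(f"posterior mean        = {post_mean:.3f}")
print(f"95% credible interval = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Pr(pi > 0.7 | y)      = {prob_gt_07:.2f}")
```

The point estimate, the credible interval and the probability statement all come out of the same object: the posterior distribution.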
---
class: inverse, center, middle, animated, rotateInUpLeft

# Hierarchical Bayesian Models

---
# References

### Blogs

- https://towardsdatascience.com/bayesian-updating-simply-explained-c2ed3e563588
- https://medium.com/callisto-media-lab-blog/the-counter-intuitive-stats-principle-that-broke-the-enigma-code-dab6ce69d423
- https://translatingnerd.com/2018/02/08/searching-for-lost-nuclear-bombs-bayes-rule-in-action/

### Blogs (Spanish)

- http://anabelforte.com/2020/04/08/thomas-bayes/
- http://anabelforte.com/2020/07/23/un-teorema-para-el-siglo-xxi/
- http://anabelforte.com/2022/04/03/en-bayesiano-como/
- https://picanumeros.wordpress.com/2021/04/18/la-estadistica-detras-del-rescate-de-la-bomba-de-palomares/

---
# References

### Books

- McGrayne, S. B. (2011). The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy. Yale University Press.
- Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods (Vol. 580). New York: Springer.
- Bolstad, W. M., & Curran, J. M. (2016). Introduction to Bayesian Statistics. John Wiley & Sons.
- Kruschke, J. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press.
- .hlbred[Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis. Chapman and Hall/CRC.]
- Christensen, R., Johnson, W., Branscum, A., & Hanson, T. E. (2010). Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians. CRC Press.

---
class: inverse, left, middle, animated, rotateInUpLeft

</br>

# Be Bayesian my friend.
An introduction to Bayesian inference

## VI Jornadas Científicas de Estudiantes de la SEB

</br>

<font size="6"> Joaquín Martínez-Minaya, 2022-09-15 </font>

</br>

<a href="http://vabar.es/" style="color:white;"> VAlencia BAyesian Research Group </a>

</br>

<a href="https://smeg-bayes.org/" style="color:white;"> Statistical Modeling Ecology Group </a>

</br>

<a href="https://giem.blogs.upv.es/" style="color:white;"> Grupo de Ingeniería Estadística Multivariante </a>

</br>

<a href="mailto:jmarmin@eio.upv.es" style="color:white;"> jmarmin@eio.upv.es </a>

---
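# Appendix: simulating the posterior predictive

The posterior predictive distribution `\(p(y_{pred} \mid y_{obs})\)` from Section 3 can be approximated by simulation: draw `\(\pi\)` from the posterior, then draw the next penalty outcome from `\(Ber(\pi)\)`. A Python sketch; as before, the data `k` and `N` are hypothetical, since the slides do not report them.

```python
import random

random.seed(2)

a, b = 83.46, 28.05   # prior from the slides
k, N = 140, 200       # HYPOTHETICAL data
post_a, post_b = k + a, N - k + b   # posterior Beta(k + a, N - k + b)

# Simulate the posterior predictive of the next penalty
n_sim = 50000
scored = 0
for _ in range(n_sim):
    pi = random.betavariate(post_a, post_b)   # parameter uncertainty
    scored += random.random() < pi            # outcome uncertainty

mc_prob = scored / n_sim
# For a Bernoulli outcome the integral has a closed form:
# Pr(y_pred = 1 | y_obs) = E[pi | y_obs] = (k + a) / (N + a + b)
exact = post_a / (post_a + post_b)
print(f"simulated = {mc_prob:.3f}, closed form = {exact:.3f}")
```

Note that the simulation propagates both sources of uncertainty, the parameter (drawn from the posterior) and the outcome (drawn from the Bernoulli), exactly as the integral does.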